
# Logistic/Probit regression

## Linear probability model

$$
E(y|x) = x'\beta
$$
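As a quick illustration, the linear probability model is just OLS on a binary outcome. A minimal sketch on simulated data, assuming `numpy` and `statsmodels`; the variable names and data-generating process are made up for illustration:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
y = (0.5 * x + rng.logistic(size=n) > 0).astype(float)  # binary outcome

lpm = sm.OLS(y, sm.add_constant(x)).fit()
# The slope is the (constant) estimated marginal effect of x on P(y=1|x),
# but fitted probabilities from this model can fall outside [0, 1].
print(lpm.params)
```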

## Probit & Logit

$$
\begin{aligned}
&P(y=1|x)=F(x'\beta)\\
&P(y=0|x)=1-F(x'\beta)
\end{aligned}
$$
- $F(x)=\Phi(x)$, the standard normal CDF: Probit, convenient when endogeneity must be addressed.
- $F(x)=\operatorname{Logit}(x)$, the logistic CDF: Logit, convenient for computational simplicity.
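Both link functions can be fitted by maximum likelihood with standard routines. A minimal sketch on simulated data, assuming `statsmodels` is available; the data-generating process is illustrative only:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
X = sm.add_constant(x)
y = (1.0 * x + rng.logistic(size=n) > 0).astype(int)

# Same linear index, two different link functions F.
probit_fit = sm.Probit(y, X).fit(disp=0)
logit_fit = sm.Logit(y, X).fit(disp=0)
print(probit_fit.params)
print(logit_fit.params)  # roughly 1.6x the probit coefficients (standard rule of thumb)
```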

## Logistic regression

The CDF of the logistic distribution has the form of a logistic function:

$$
F(x)=\frac{1}{1+e^{-(x-\mu)/s}}
$$

Logistic regression assumes:

$$
P(Y=1 \mid X)=\frac{1}{1+\exp\left(-\left(w_{0}+\sum_{i=1}^{n} w_{i} X_{i}\right)\right)}
$$
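This probability is a plain sigmoid of the linear index, so it can be evaluated directly. A minimal sketch, assuming `numpy`; the helper `p_y1` and the example weights are hypothetical:

```python
import numpy as np

def p_y1(X, w0, w):
    """P(Y=1|X) = 1 / (1 + exp(-(w0 + X @ w))) for an (n, k) feature matrix X."""
    return 1.0 / (1.0 + np.exp(-(w0 + X @ w)))

X = np.array([[0.5, -1.0],
              [2.0, 0.3]])
print(p_y1(X, w0=0.1, w=np.array([0.8, -0.5])))
```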

### Latent score interpretation

$$
y=1 \quad \text{iff} \quad y^{*}=x'w+\epsilon >0, \qquad \epsilon \sim \text{Logistic}.
$$

$$
\begin{aligned}
P(y=1|x)&=P(y^* >0|x)\\
&=P(\epsilon > -x'w \mid x)\\
&=P(\epsilon < x'w \mid x) \quad \text{(by symmetry of the logistic distribution)}\\
&=F_{\epsilon}(x'w)
\end{aligned}
$$
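A quick simulation check of this latent-score reading: drawing $\epsilon$ from a logistic distribution and thresholding the latent score reproduces $F_{\epsilon}(x'w)$. Illustrative sketch, assuming `numpy`:

```python
import numpy as np

rng = np.random.default_rng(2)
xw = 0.7                                    # a fixed value of the index x'w
eps = rng.logistic(size=200_000)            # epsilon ~ Logistic(0, 1)
y = (xw + eps > 0).astype(float)            # y = 1{y* > 0}
print(y.mean())                             # empirical P(y=1|x), about 0.668
print(1.0 / (1.0 + np.exp(-xw)))            # F_eps(x'w), the logistic CDF at 0.7
```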

## Marginal effect

$$
\text{Odds}=\frac{P(y=1|x)}{P(y=0|x)}=e^{x'w}
$$

$w$ represents the marginal effect of $x$ on the log of the odds.
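Equivalently, $e^{w_j}$ is the odds ratio for a one-unit change in $x_j$. A tiny sketch with made-up coefficients:

```python
import numpy as np

w = np.array([0.8, -0.5])   # illustrative logit coefficients
print(np.exp(w))            # odds ratios per unit change in x_j: about 2.23 and 0.61
```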

## MLE

$$
\hat{W}=\underset{W}{\arg\max} \sum_{l} \ln P\left(Y^{l} \mid X^{l}, W\right)
$$
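The same objective can be maximized with a generic optimizer. A minimal sketch, assuming `numpy` and `scipy`; the data-generating process and starting values are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
w_true = np.array([0.2, 1.0])
y = (X @ w_true + rng.logistic(size=n) > 0).astype(float)

def neg_loglik(w):
    # Negative of sum_l ln P(Y^l | X^l, w) under the logistic link.
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

w_hat = minimize(neg_loglik, x0=np.zeros(2), method="BFGS").x
print(w_hat)  # close to w_true in large samples
```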

## Log-loss function

From a machine learning perspective, the MLE objective can be rewritten as a log-loss cost function, with an additional penalty (regularization) term.

$$
\begin{aligned}
\hat{W} &= \underset{W}{\arg\min}\; -C\sum_{l} \ln P\left(Y^{l} \mid X^{l}, W\right) + \|W\| \\
&= \underset{W}{\arg\min}\; -C\sum_{l} \left[ Y^l \ln\!\left(P(Y^l)\right) + (1-Y^l) \ln\!\left(1-P(Y^l)\right) \right] + \|W\|
\end{aligned}
$$
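This is essentially the objective scikit-learn's `LogisticRegression` minimizes, up to the exact form of the penalty (its default is an L2 penalty $\tfrac{1}{2}\|W\|^2$ rather than $\|W\|$). An illustrative sketch on simulated data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(4)
n = 2000
X = rng.normal(size=(n, 2))
y = (X @ np.array([1.0, -0.5]) + rng.logistic(size=n) > 0).astype(int)

# Larger C => weaker regularization (C multiplies the data-fit term).
clf = LogisticRegression(C=1.0, penalty="l2").fit(X, y)
print(clf.intercept_, clf.coef_)
print(log_loss(y, clf.predict_proba(X)))  # average log-loss on the training sample
```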

## Probit with endogeneity

From the latent-score interpretation, $y_1 = 1_{\{y_1^* > 0\}}$. In the structural model below, $(u,v)$ are bivariate normal with $\operatorname{var}(u)=1$:

$$
\begin{aligned}
&y_1^* = z_1'\delta_1 + \alpha_1 y_2 + u;\\
&y_2 = z_2' \delta_2 + v;\\
&u = \theta v + e
\end{aligned}
$$

$y_2$ is correlated with $u$ through $v$.

We have

$$
\operatorname{var}(e)=1-\frac{\operatorname{cov}(v,u)^2}{\operatorname{var}(v)\operatorname{var}(u)}=1-\rho^2
$$

Rescaling by $\sqrt{1-\rho^2}$ (a GLS-style standardization) gives $\tilde{e}=e/\sqrt{1-\rho^2}\sim N(0,1)$:

$$
\begin{aligned}
&y_1^* = z_1'\delta_1 + y_2\alpha_1 + v\theta + e\\
&y_1^*/\sqrt{1-\rho^2} = z_1'\delta_1/\sqrt{1-\rho^2} + y_2\alpha_1/\sqrt{1-\rho^2} + v\theta/\sqrt{1-\rho^2} + e/\sqrt{1-\rho^2}\\
&\tilde{y}_1^* = z_1'\tilde{\delta}_1 + y_2\tilde{\alpha}_1 + v\tilde{\theta} + \tilde{e}
\end{aligned}
$$

The probit model then reduces to:

$$
P(y_1=1|z_1,y_2,v)=P(\tilde{y}_1^*>0|z_1,y_2,v)=\Phi(z_1'\tilde{\delta}_1+\tilde{\alpha}_1 y_2+\tilde{\theta} v)
$$

Two-step estimation:

1. Estimate $v$ from the first-stage regression of $y_2$ on $z_2$.
2. Run a probit of $y_1$ on $z_1$, $y_2$, and $\hat{v}$, as sketched below.
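A minimal sketch of this control-function two-step on simulated data, assuming `numpy` and `statsmodels`; all parameter values and the data-generating process are illustrative, and the second-step coefficients estimate the rescaled (tilde) parameters:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 5000
z1 = rng.normal(size=n)                     # exogenous regressor in the outcome equation
z2 = rng.normal(size=n)                     # instrument, excluded from the outcome equation
v = rng.normal(size=n)
u = 0.5 * v + rng.normal(scale=np.sqrt(0.75), size=n)   # var(u) = 1, corr(u, v) = 0.5
y2 = 1.0 * z2 + v                           # endogenous regressor
y1 = (1.0 * z1 + 0.8 * y2 + u > 0).astype(int)

# Step 1: first-stage regression of y2 on the exogenous variables; keep the residual v_hat.
first = sm.OLS(y2, sm.add_constant(np.column_stack([z1, z2]))).fit()
v_hat = first.resid

# Step 2: probit of y1 on z1, y2 and the control function v_hat.
X2 = sm.add_constant(np.column_stack([z1, y2, v_hat]))
second = sm.Probit(y1, X2).fit(disp=0)
print(second.params)  # estimates of the rescaled (tilde) coefficients
```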

## Probit with fixed effects

$$
y_{it}^{*} = x_{it}'\beta + \alpha_i + \varepsilon_{it}
$$
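One crude way to fit this is a pooled probit with individual dummies for $\alpha_i$. This is only an illustration: with short panels the dummy approach suffers from the incidental parameters problem, so the sketch below (assuming `numpy`, `pandas`, and `statsmodels`) should not be read as a recommended estimator:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(6)
n_i, n_t = 50, 20
i = np.repeat(np.arange(n_i), n_t)                 # individual index
alpha = 0.5 * rng.normal(size=n_i)                 # individual effects alpha_i
x = rng.normal(size=n_i * n_t)
y = (1.0 * x + alpha[i] + rng.normal(size=n_i * n_t) > 0).astype(int)

# x plus one dummy per individual (the dummies absorb alpha_i and the intercept).
X = np.column_stack([x, pd.get_dummies(i, dtype=float).to_numpy()])
fe_probit = sm.Probit(y, X).fit(method="bfgs", maxiter=500, disp=0)
print(fe_probit.params[0])  # estimate of beta; biased when n_t is small (incidental parameters)
```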